Robust partitional clustering by outlier and density insensitive seeding

Authors

  • Mohammad Al Hasan
  • Vineet Chaoji
  • Saeed Salem
  • Mohammed J. Zaki
Abstract

The leading partitional clustering technique, k-means, is one of the most computationally efficient clustering methods. However, it produces a locally optimal solution that depends strongly on its initial seeds. Bad initial seeds can also cause natural clusters to be split or merged, even when the clusters are well separated. In this paper, we propose ROBIN, a novel method for selecting initial seeds in k-means-type algorithms. ROBIN imposes constraints on the chosen seeds that lead to better clustering when k-means converges. These constraints make the seed selection insensitive to outliers in the data and help it handle clusters of variable density or multiple scales. Furthermore, the constraints make the method deterministic, so a single run suffices to obtain good initial seeds, whereas traditional random seeding approaches need many runs to find seeds that yield a satisfactory clustering. We comprehensively evaluated ROBIN against state-of-the-art seeding methods on a wide range of synthetic and real datasets, and show that ROBIN consistently outperforms existing approaches in terms of clustering quality.
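The abstract does not spell out ROBIN's constraints. The sketch below is only one plausible reading of the ideas it names, not the paper's algorithm: deterministic farthest-first seeding that skips any candidate whose Local Outlier Factor (LOF) marks it as an outlier, followed by plain Lloyd k-means from those seeds. All function names (`lof_scores`, `robin_style_seeds`, `kmeans`) are hypothetical, and so are the choices of the data mean as the reference point for the first seed, the neighbour count `k`, and the outlier threshold `max_lof=1.5`. The LOF used here is a simplified version that takes exactly `k` neighbours per point.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def lof_scores(points: Sequence[Point], k: int) -> List[float]:
    """Simplified Local Outlier Factor: exactly k neighbours, ties broken by index."""
    n = len(points)
    if n <= k:
        raise ValueError("need more points than k")
    # k nearest neighbours of every point (brute force)
    neighbours = [
        sorted((j for j in range(n) if j != i),
               key=lambda j: math.dist(points[i], points[j]))[:k]
        for i in range(n)
    ]
    # k-distance: distance to the k-th nearest neighbour
    k_dist = [math.dist(points[i], points[nb[-1]]) for i, nb in enumerate(neighbours)]
    # local reachability density = 1 / mean reachability distance
    lrd = []
    for i, nb in enumerate(neighbours):
        reach = sum(max(k_dist[j], math.dist(points[i], points[j])) for j in nb)
        lrd.append(k / reach if reach > 0 else math.inf)
    # LOF is about 1 inside a cluster and much larger for isolated points
    scores = []
    for i, nb in enumerate(neighbours):
        if math.isinf(lrd[i]):
            scores.append(1.0)  # point coincides with its neighbours: treat as inlier
        else:
            scores.append(sum(lrd[j] for j in nb) / (k * lrd[i]))
    return scores


def robin_style_seeds(points: Sequence[Point], n_clusters: int,
                      k: int = 5, max_lof: float = 1.5) -> List[Point]:
    """Hypothetical outlier-aware, deterministic farthest-first seeding."""
    if not 0 < n_clusters <= len(points):
        raise ValueError("n_clusters must be in 1..len(points)")
    scores = lof_scores(points, k)
    dim = len(points[0])
    # ASSUMPTION: the data mean is the reference point for the first seed
    ref = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))
    nearest = [math.dist(p, ref) for p in points]
    seeds: List[int] = []
    for step in range(n_clusters):
        # farthest-first order; ties broken by index, so the result is deterministic
        order = sorted(range(len(points)), key=lambda i: (-nearest[i], i))
        # constraint: skip candidates whose LOF marks them as outliers
        chosen: Optional[int] = next(
            (i for i in order if i not in seeds and scores[i] <= max_lof), None)
        if chosen is None:  # every remaining point looks like an outlier
            chosen = next(i for i in order if i not in seeds)
        seeds.append(chosen)
        d_new = [math.dist(p, points[chosen]) for p in points]
        # from now on, distances are measured to the nearest chosen seed
        nearest = d_new if step == 0 else [min(a, b) for a, b in zip(nearest, d_new)]
    return [points[i] for i in seeds]


def assign(points: Sequence[Point], centers: Sequence[Point]) -> List[int]:
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c])) for p in points]


def kmeans(points: Sequence[Point], seeds: Sequence[Point], iters: int = 100):
    """Plain Lloyd iterations started from the given seeds."""
    centers = [tuple(s) for s in seeds]
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for c, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # keep the old center if a cluster becomes empty
            new_centers.append(
                tuple(sum(x) / len(members) for x in zip(*members)) if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)
```

On two tight groups of points plus one far-away point, `robin_style_seeds(data, 2, k=3)` should pick one seed from each group and never the isolated point. Each choice is a sorted scan with index tie-breaking, so repeated calls return the same seeds, which matches the abstract's claim that one run is enough. LOF compares a point's density with that of its neighbours rather than applying a global distance cutoff. That is presumably why a density-ratio test suits clusters of differing density, though the paper's exact criterion may differ.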

Similar resources

Application of Outlier Robust Nonlinear Mixed Effect Estimation in Examining the Effect of Phenylephrine in Rat Corpus Cavernosum

Ignoring two main characteristics of concentration-response data, namely the correlation between observations and the presence of outliers, may lead to misleading results, so a dedicated method is needed. In this paper, outlier-robust nonlinear mixed-effects estimation is used to examine the effect of phenylephrine on the rat corpus cavernosum. In this study, eight different doses of phenylephrin...

Robust Estimation in Linear Regression Model: the Density Power Divergence Approach

The minimum density power divergence method provides a robust estimate when the dataset contains outliers. In this study, we introduce a robust minimum density power divergence estimator for the parameters of the linear regression model and then, through numerical examples, show the robustness of this est...

Constraint Based Partitional Clustering – A Comprehensive Study and Analysis (Aparna)

Data clustering is the task of forming a predefined number of clusters such that data points within each cluster are very similar to each other and data points in different clusters are dissimilar. Clustering is widely used in domains such as bioinformatics, medical data, imaging, marketing studies and crime analysis. The popular types of clustering techniques ar...

On Data-Independent Properties for Density-Based Dissimilarity Measures in Hybrid Clustering

Hybrid clustering combines partitional and hierarchical clustering to obtain both computational efficiency and versatility in cluster shape. In such clustering, a dissimilarity measure plays a crucial role in the hierarchical merging. The dissimilarity measure has a great impact on the final clustering, and data-independent properties are needed to choose the right dissimilarity measure for the problem a...

A Detailed Study and Analysis of different Partitional Data Clustering Techniques

Data clustering is considered very significant in application areas such as text mining, fraud detection, health care, image processing and bioinformatics. Because of this breadth of applications, many techniques have been proposed in the literature across research domains. Data clustering is one of the important tasks that make up data mining. Clustering can be...

Journal:
  • Pattern Recognition Letters

Volume 30

Publication date: 2009